Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents
Abstract
Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. To increase vocabulary coverage, a very large amount of text data is needed. In this paper, we extend previously proposed neural network models for word embedding: the word vector representation proposed by Mikolov is enriched with an additional non-linear transformation. This model better accounts for lexical and semantic word relationships. In the context of broadcast news transcription, experimental results show that, in terms of recall, the proposed model is well able to select new relevant proper names.
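The abstract does not say where the additional non-linear transformation sits in the network, so the following is only a minimal sketch under stated assumptions: a CBOW-style forward pass in which the averaged context embedding is passed through one extra tanh layer before the softmax output. The function name `forward`, the helper `random_matrix`, the choice of tanh, the layer sizes, and the random weights are all hypothetical and are not taken from the paper.

```python
import math
import random


def forward(context_ids, E, W_h, b_h, W_out):
    """CBOW-style forward pass with one extra non-linear layer (illustrative only)."""
    dim = len(E[0])
    # Linear projection, as in CBOW: average the context word embeddings.
    projection = [
        sum(E[i][d] for i in context_ids) / len(context_ids) for d in range(dim)
    ]
    # Assumed extra step: a tanh layer applied to the projection.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, projection)) + b)
        for row, b in zip(W_h, b_h)
    ]
    # Softmax over the vocabulary gives the predicted-word distribution.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W_out]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return hidden, [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights for the demonstration."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
vocab_size, embed_dim, hidden_dim = 10, 4, 3  # toy sizes, not from the paper
E = random_matrix(vocab_size, embed_dim, rng)       # input embeddings
W_h = random_matrix(hidden_dim, embed_dim, rng)     # extra non-linear layer
b_h = [0.0] * hidden_dim
W_out = random_matrix(vocab_size, hidden_dim, rng)  # output layer

hidden, probs = forward([1, 2, 5], E, W_h, b_h, W_out)
```

In this sketch `hidden` plays the role of the enriched representation and `probs` is a distribution over the toy vocabulary. A trained model would learn `E`, `W_h`, `b_h`, and `W_out` from text rather than drawing them at random.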
Similar resources
Continuous word representation using neural networks for proper name retrieval from diachronic documents
Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. One approach for increasing the vocabulary coverage of a speech transcription system is to automatically retrieve new proper names from contemporary diachronic text documents. In recent years, neural network...
Proper name retrieval from diachronic documents for automatic speech transcription using lexical and temporal context
Proper names are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving new proper names from contemporary diachronic text documents. The idea is to use in-vocabulary proper names as an anchor to collect new linked proper names from the diachronic corpus. Our assump...
How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News
Out-Of-Vocabulary (OOV) words missed by Large Vocabulary Continuous Speech Recognition (LVCSR) systems can be recovered with the help of topic and semantic context of the OOV words captured from a diachronic text corpus. In this paper we investigate how the choice of documents for the diachronic text corpora affects the retrieval of OOV Proper Names (PNs) relevant to an audio document. We first...
Investigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval
Background and Aim: This research investigates the impact of authors’ rank in bibliographic networks on the document-centered model of expertise retrieval. Its purpose is to find out what kind of author ranking in bibliographic networks can improve the performance of the document-centered model. Methodology: The current research is experimental. To operationalize research goals, a new test colle...
Neural Network Model of System for Information Retrieval from Text Documents in Slovak Language
The aim of the paper is to describe an information retrieval model that retrieves information from text documents in the Slovak language and uses neural networks for this purpose. The model is derived from a linguistic and conceptual approach to the analysis of text documents in the Slovak language. The neural network model, based on multilayer perceptron and spreading activation netwo...